15. Exercise: Improvement, Class Imbalance

Model Improvement: Accounting for Class Imbalance

We have a model that is tuned for higher recall, which aims to reduce the number of false negatives. Earlier, we discussed how class imbalance may actually bias our model towards predicting that all transactions are valid, resulting in higher counts of false negatives and true negatives. It stands to reason that this model could be further improved if we account for this imbalance!

To account for class imbalance when training a binary classifier, LinearLearner offers the hyperparameter positive_example_weight_mult, which is the weight assigned to positive (fraudulent) examples during training. The weight of negative (valid) examples is fixed at 1.

The hyperparameter documentation for positive_example_weight_mult reads:

"If you want the algorithm to choose a weight so that errors in classifying negative vs. positive examples have equal impact on training loss, specify balanced ."

In the main exercise notebook, the exercises, from defining to deploying an improved model, look as follows:

EXERCISE: Create a LinearLearner with a positive_example_weight_mult parameter

In addition to tuning a model for higher recall, you should add a parameter that helps account for class imbalance.

# instantiate a LinearLearner
# include params for tuning for higher recall
# *and* account for class imbalance in training data
linear_balanced = None
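
For reference, one way to fill this in is sketched below. This assumes the SageMaker Python SDK's LinearLearner estimator and variables defined earlier in the notebook (role, bucket, prefix, and sagemaker_session); the target recall of 0.9 and the instance type are example values, and constructor argument names (e.g., train_instance_count) vary with the SDK version.

from sagemaker import LinearLearner

# instantiate a LinearLearner tuned for recall *and* class balance;
# role, bucket, prefix, and sagemaker_session are assumed from earlier cells
linear_balanced = LinearLearner(role=role,
                                train_instance_count=1,
                                train_instance_type='ml.c4.xlarge',
                                predictor_type='binary_classifier',
                                output_path='s3://{}/{}'.format(bucket, prefix),
                                sagemaker_session=sagemaker_session,
                                epochs=15,
                                # optimize precision at a target recall
                                binary_classifier_model_selection_criteria='precision_at_target_recall',
                                target_recall=0.9,
                                # weight positive (fraud) examples to balance the classes
                                positive_example_weight_mult='balanced')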

EXERCISE: Train the balanced estimator

Fit the new, balanced estimator on the formatted training data.

%%time 
# train the estimator on formatted training data
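
A minimal sketch of this step, assuming formatted_train_data is the RecordSet created earlier with record_set from the training features and labels:

# train the estimator on the formatted training data (a RecordSet)
linear_balanced.fit(formatted_train_data)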

EXERCISE: Deploy and evaluate the balanced estimator

Deploy the balanced predictor and evaluate it. Do the results match your expectations?

%%time 
# deploy and create a predictor
balanced_predictor = None
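
A minimal sketch, assuming the estimator trained above (the instance type is just an example):

# deploy the trained model to an endpoint and return a real-time predictor
balanced_predictor = linear_balanced.deploy(initial_instance_count=1,
                                            instance_type='ml.t2.medium')

Once deployed, you can pass balanced_predictor to whatever evaluation code you wrote for the earlier models, such as a helper that computes a confusion matrix on the test data.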

An important question to ask when evaluating your model is: do the results match your expectations? Much like in a scientific experiment, it is good practice to start with a hypothesis that drives your idea for improving a model; if the trained model reacts differently than you expect (i.e., the model metrics are worse), it is worth revisiting your assumptions and approach.

Try to complete all these tasks, and if you get stuck, you can reference the solution video, next!

Shutting Down the Endpoint

Remember to delete your deployed model endpoint after you finish your evaluation.
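
A minimal sketch, assuming balanced_predictor is the predictor returned by deploy above:

# delete the endpoint so you don't keep incurring charges
balanced_predictor.delete_endpoint()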